Intelligent Paper Notes

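Meta-learning considers the problem of learning an efficient learning procedure that can leverage past experience to accurately solve new tasks. However, the effectiveness of meta-learning depends critically on the distribution of tasks available for training, which is usually assumed to be known a priori or is constructed from limited supervised datasets. In this work, we aim to provide task distributions for meta-learning by considering self-supervised tasks automatically proposed from unlabeled text, enabling large-scale meta-learning in NLP. We design multiple distributions of self-supervised tasks by considering important aspects of task diversity, difficulty, type, domain, and curriculum, and investigate how these aspects affect meta-learning performance. Our analysis shows that all of these factors meaningfully alter the task distribution, and some of them lead to significant improvements in the downstream performance of the meta-learned models. Empirically, results on 20 downstream tasks show significant improvements in few-shot learning: up to +4.2% absolute accuracy (on average) over the previous unsupervised meta-learning method, with performance comparable to supervised methods.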

translated by Google Translate
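The abstract does not say how the self-supervised tasks are built from unlabeled text. As an illustration only, the sketch below assumes one simple construction: pick N words from a corpus, collect k sentences containing each word, mask that word, and use the identity of the masked word as the class label, which yields an N-way, k-shot classification task with no human labels. The function name `make_masked_word_task` and the `[MASK]` token are hypothetical, and the distribution choices the paper studies (diversity, difficulty, type, domain, curriculum) are not modeled here.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"  # hypothetical mask token


def make_masked_word_task(
    sentences: List[str], n_way: int, k_shot: int, seed: int = 0
) -> List[Tuple[str, int]]:
    """Build one N-way, k-shot classification task from unlabeled sentences.

    Assumed construction (not specified in the abstract): each class is a word,
    and each example is a sentence with that word masked out.
    """
    rng = random.Random(seed)
    # Map each word to the sentences that contain it (one entry per sentence).
    index: Dict[str, List[str]] = {}
    for sent in sentences:
        for word in set(sent.split()):
            index.setdefault(word, []).append(sent)
    # Only words that occur in at least k_shot sentences can serve as a class.
    candidates = sorted(w for w, sents in index.items() if len(sents) >= k_shot)
    if len(candidates) < n_way:
        raise ValueError("corpus has too few frequent words for this task size")
    task: List[Tuple[str, int]] = []
    for label, word in enumerate(rng.sample(candidates, n_way)):
        for sent in rng.sample(index[word], k_shot):
            # Hide every occurrence of the class word; the label is its class index.
            masked = " ".join(MASK if tok == word else tok for tok in sent.split())
            task.append((masked, label))
    rng.shuffle(task)
    return task


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "a cat chased the dog",
        "the dog barked loudly",
        "a dog and a cat played",
    ]
    for text, label in make_masked_word_task(corpus, n_way=2, k_shot=2, seed=1):
        print(label, text)
```

Sampling many such tasks with different words, corpora, or values of N and k gives a task distribution; how that distribution is shaped is the question the paper investigates.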

First De-Trend then Attend: Rethinking Attention for Time-Series Forecasting

Xiyuan Zhang, Xiaoyong Jin, Karthick Gopalswamy, Gaurav Gupta, Youngsuk Park, Xingjian Shi, Hao Wang, Danielle C. Maddix, Yuyang Wang

Category: Machine Learning

2022-12-15

Transformer-based models have gained wide popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in time domain, recent works also explore learning attention in frequency domains (e.g., Fourier domain, wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in different time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., linear kernel to attention scores). Empirically, we analyze how attention models of different domains show different behaviors through various synthetic experiments with seasonality, trend and noise, with emphasis on the role of softmax operation therein. Both these theoretical and empirical analyses motivate us to propose a new method: TDformer (Trend Decomposition Transformer), that first applies seasonal-trend decomposition, and then additively combines an MLP which predicts the trend component with Fourier attention which predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
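The claim that attention in the time and Fourier domains is equivalent under a linear kernel can be checked numerically. The pure-Python sketch below assumes a unitary DFT F applied along the time axis and a conjugate-transpose inner product for the Fourier-domain scores. Under those assumptions, attending to the transformed Q, K, V with no softmax and then applying the inverse DFT reproduces time-domain attention, because F^H F = I and therefore F^H (FQ)(FK)^H (FV) = Q K^T V for real K. All helper names are hypothetical; this checks only the linear-kernel identity, not TDformer's decomposition, MLP, or Fourier attention layer.

```python
import cmath
import random
from typing import List

Matrix = List[List[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def conj_transpose(a: Matrix) -> Matrix:
    """Conjugate transpose a^H (the plain transpose for real-valued input)."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def dft_matrix(length: int, inverse: bool = False) -> Matrix:
    """Unitary DFT matrix F along the time axis, or its inverse F^H."""
    sign = 1 if inverse else -1
    scale = 1 / length ** 0.5
    return [
        [scale * cmath.exp(sign * 2j * cmath.pi * k * n / length) for n in range(length)]
        for k in range(length)
    ]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Attention with a linear kernel and no softmax: (Q K^H) V."""
    return matmul(matmul(q, conj_transpose(k)), v)


def fourier_linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Transform Q, K, V along time, attend in the Fourier domain, transform back."""
    f = dft_matrix(len(q))
    f_inv = dft_matrix(len(q), inverse=True)
    out_hat = linear_attention(matmul(f, q), matmul(f, k), matmul(f, v))
    return matmul(f_inv, out_hat)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[complex(rng.gauss(0.0, 1.0)) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    rng = random.Random(0)
    length, dim = 8, 3  # sequence length and head dimension, chosen arbitrarily
    q, k, v = (random_matrix(length, dim, rng) for _ in range(3))
    gap = max_abs_diff(linear_attention(q, k, v), fourier_linear_attention(q, k, v))
    print(f"max difference between time- and Fourier-domain outputs: {gap:.2e}")
```

The identity relies on the scores entering the output linearly; inserting a nonlinearity such as a softmax over the scores would generally break it, which is why the abstract singles out the role of softmax.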

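We aim to address this problem by introducing a comprehensive distributed deep learning (DDL) profiler, which can identify the various execution "stalls" that DDL suffers from when running on a public cloud. We implemented the profiler by extending prior work to additionally estimate two types of communication stalls: interconnect stalls and network stalls. We use the profiler to train popular DNN models in order to characterize various AWS GPU instances, and we list their advantages and shortcomings so that users can make informed decisions. We observe that more expensive GPU instances may not be the most performant for all DNN models, and that AWS may allocate hardware interconnect resources suboptimally. Specifically, the intra-machine interconnect can introduce communication overheads of up to 90% of DNN training time, and network-connected instances can suffer up to a 5x slowdown compared with training on a single instance. Furthermore, we model the impact of DNN macroscopic features, such as the number of layers and the number of gradients, on communication stalls. Finally, we propose a measurement-based recommendation model that helps users lower the monetary cost of DDL on a public cloud.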
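The abstract does not describe how the recommendation model works. A minimal sketch, assuming the profiler reports per-iteration time, time spent in communication stalls, and global batch size for each candidate configuration, is to convert those measurements into cost per million training samples and pick the cheapest configuration that meets a throughput target. The same measurements also yield the two ratios the abstract reports: the share of training time lost to communication, and a slowdown relative to a single-instance baseline (defined here, as an assumption, as the ratio of per-iteration times). All class and field names, configuration labels, and prices are made up for illustration; they are neither AWS prices nor the paper's model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    """Profiler measurements for one cloud configuration (hypothetical schema)."""

    name: str             # made-up configuration label
    price_per_hour: float  # USD per hour for all instances in the configuration (made up)
    global_batch: int      # samples processed per training iteration across all GPUs
    step_seconds: float    # measured wall-clock time per training iteration
    stall_seconds: float   # portion of step_seconds spent stalled on communication

    @property
    def samples_per_second(self) -> float:
        return self.global_batch / self.step_seconds

    def stall_fraction(self) -> float:
        """Share of iteration time lost to communication stalls."""
        return self.stall_seconds / self.step_seconds

    def cost_per_million_samples(self) -> float:
        """Dollars to train on one million samples at the measured throughput."""
        hours = 1_000_000 / self.samples_per_second / 3600.0
        return hours * self.price_per_hour


def slowdown(profile: Profile, baseline: Profile) -> float:
    """Per-iteration time relative to a baseline (assumed definition of slowdown)."""
    return profile.step_seconds / baseline.step_seconds


def recommend(profiles: List[Profile], min_samples_per_second: float = 0.0) -> Profile:
    """Cheapest configuration per sample among those meeting a throughput target."""
    feasible = [p for p in profiles if p.samples_per_second >= min_samples_per_second]
    if not feasible:
        raise ValueError("no configuration meets the throughput target")
    return min(feasible, key=Profile.cost_per_million_samples)


if __name__ == "__main__":
    profiles = [
        Profile("A: 1 instance", 3.0, 128, 0.32, 0.02),
        Profile("B: 2 instances over the network", 6.0, 256, 0.512, 0.25),
        Profile("C: 1 larger multi-GPU instance", 12.0, 512, 0.40, 0.10),
    ]
    for p in profiles:
        print(
            f"{p.name}: ${p.cost_per_million_samples():.2f} per 1M samples, "
            f"{p.stall_fraction():.0%} stalled, "
            f"{slowdown(p, profiles[0]):.2f}x step time vs A"
        )
    print("cheapest:", recommend(profiles).name)
    print("cheapest at >= 1000 samples/s:", recommend(profiles, 1000).name)
```

Ranking by cost per sample rather than by hourly price reflects the abstract's observation that the most expensive instance is not necessarily the best choice: a pricier configuration only wins if its measured throughput rises faster than its price.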

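Detecting autism spectrum disorder (ASD) is a major concern for occupational therapists. The greatest challenge posed by this neurodevelopmental disorder lies in analyzing and exploring children's various symptoms at an early stage of development. Such early identification can help therapists and clinicians provide appropriate assistive support so that children can lead independent lives. Facial expressions and emotions perceived by children can support early intervention for autism. To this end, the paper recognizes basic facial expressions and explores the corresponding emotions over time-variant factors. Emotions are analyzed from facial expressions recognized by a CNN using 68 landmark points plotted on the frontal face, combined with a prediction network in a system called RCNN-FER. The paper adopts R-CNN to benefit from its higher accuracy and performance with reduced time complexity, predicting emotions as a textual network analysis. Compared with simple machine learning models built for such identification for the autism community, the paper demonstrates better accuracy in recognizing the emotions of autistic children.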